[data] Slice output blocks to respect target block size #40248
Conversation
Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
Nice fix. Have you run the benchmarks? I'd like to understand the perf impact.
Good idea, will do this.
Did some spot checks on the single-node performance benchmarks, and it seems like there's no obvious difference.
Revert "[data] Slice output blocks to respect target block size (ray-project#40248)" This reverts commit d5f1eed.
#40248 changed output block creation so that when a task produces its output blocks, it tries to slice them before yielding to respect the target block size. Unfortunately, all-to-all ops currently don't support dynamic block splitting. This means that if we fuse an upstream map iterator with an all-to-all op, the all-to-all task has to fuse all of the sliced blocks back together again, which appears to increase memory usage significantly. This PR avoids the issue by overriding the upstream map iterator's target block size to infinity when it is fused with an all-to-all op. It also adds a logger warning explaining how to work around the issue. Related issue number: Closes #40518. --------- Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
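The fusion rule described above can be sketched as follows. This is a minimal illustration, not Ray Data's actual internals; the function name and signature are hypothetical.

```python
import math

def effective_target_block_size(
    map_target_bytes: int, fused_with_all_to_all: bool
) -> float:
    """Return the block-size limit a fused map task should use.

    Hypothetical sketch: when the map operator is fused with an
    all-to-all operator (which cannot dynamically split blocks),
    slicing the map's outputs would only be undone by the all-to-all
    task, increasing heap memory. So the target is overridden to
    infinity, i.e. slicing is disabled.
    """
    if fused_with_all_to_all:
        return math.inf
    return map_target_bytes

# Standalone map: the configured target (e.g. 128 MiB) is respected.
print(effective_target_block_size(128 * 1024 * 1024, False))  # 134217728
# Fused with an all-to-all op: no slicing.
print(effective_target_block_size(128 * 1024 * 1024, True))   # inf
```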
This addresses #40759 and #38400 for the 2.8 release branch. Either this change or reverting #40248 appears to fix #40759, but the root cause has not been identified yet. For #38400, we will merge a longer-term fix to master for 2.9. This PR should be safe since it reverts the Data block size back to the 2.7 default. Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
With #40248, block sizes are now respected. This increases the default shuffle block size to 1GiB, which restores the previous behavior in the release test dataset_shuffle_sort_1tb. There is a possibility that this increases worker heap memory pressure during shuffle operations, but that can be resolved by overriding DataContext. Related issue number: Closes #38400. --------- Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
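The trade-off behind the 1 GiB shuffle default comes down to block count: for a fixed dataset size, a larger target block size means fewer (but heavier) blocks to shuffle. A small arithmetic sketch, using the 1 TB dataset from the release test mentioned above (the helper function is illustrative, not part of Ray's API):

```python
GiB = 1024 ** 3

def num_shuffle_blocks(dataset_bytes: int, target_block_bytes: int) -> int:
    """Blocks produced if outputs are sliced to the target block size
    (ceiling division, since the last block may be partial)."""
    return -(-dataset_bytes // target_block_bytes)

one_tb = 1024 * GiB
# Smaller 128 MiB blocks: many more objects to track and shuffle.
print(num_shuffle_blocks(one_tb, 128 * 1024 * 1024))  # 8192
# 1 GiB default restored by this PR: far fewer blocks,
# at the cost of higher per-task heap usage.
print(num_shuffle_blocks(one_tb, 1 * GiB))            # 1024
```

Users who hit heap memory pressure can lower the target block size via Ray Data's DataContext configuration, as the PR text suggests; the exact attribute to set should be checked against the Ray version in use.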
Why are these changes needed?
This slices a task's output blocks to ensure that we respect the target max block size. This can cause a performance penalty for cases where the batch size is misaligned with the output block size, but this is necessary for stability and can be optimized later (by auto-choosing a better batch size).
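The slicing behavior described above can be sketched as a generator that accumulates rows and emits a block whenever adding the next row would exceed the target size. This is a simplified illustration of the idea, not Ray Data's actual block builder; names and the byte-size callback are assumptions.

```python
def slice_output(rows, target_block_bytes, row_nbytes=len):
    """Yield output blocks (lists of rows) whose total size stays at or
    under target_block_bytes whenever possible. A single oversized row
    still becomes its own block, mirroring the fact that a block can
    never be smaller than one row."""
    block, block_bytes = [], 0
    for row in rows:
        size = row_nbytes(row)
        if block and block_bytes + size > target_block_bytes:
            yield block  # emit a slice that respects the target
            block, block_bytes = [], 0
        block.append(row)
        block_bytes += size
    if block:
        yield block  # final, possibly partial, block

# Five 40-byte rows with a 100-byte target: blocks of 2, 2, and 1 rows.
blocks = list(slice_output([b"x" * 40] * 5, target_block_bytes=100))
print([len(b) for b in blocks])  # [2, 2, 1]
```

Note the misalignment the PR text mentions: with a 40-byte batch and a 100-byte target, blocks land at 80 bytes rather than 100, which is the performance penalty that auto-choosing a better batch size could later remove.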
Related issue number
#40026.
Checks
- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I have added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.